{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "provenance": []
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "language_info": {
      "name": "python"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "source": [
        "# Using RouteLLM to Optimize LLM Usage\n",
        "\n",
        "RouteLLM is a flexible framework for serving and evaluating LLM routers, designed to maximize performance while minimizing cost.\n",
        "\n",
        "Key features:\n",
        "\n",
        "* Seamless integration: acts as a drop-in replacement for the OpenAI client or runs as an OpenAI-compatible server, routing simpler queries to cheaper models.\n",
        "\n",
        "* Pre-trained routers out of the box: reported to cut costs by up to 85% while preserving 95% of GPT-4 performance on widely used benchmarks such as MT-Bench.\n",
        "\n",
        "* Cost-effective: reported to match the performance of leading commercial offerings while being over 40% cheaper.\n",
        "\n",
        "* Extensible and customizable: add new routers, tune thresholds, and compare performance across multiple benchmarks.\n",
        "\n",
        "In this tutorial, we’ll walk through how to:\n",
        "\n",
        "* Load and use a pre-trained router.\n",
        "\n",
        "* Calibrate it for your own use case.\n",
        "\n",
        "* Test routing behavior on different types of prompts."
      ],
      "metadata": {
        "id": "83-7A_sLXS-S"
      }
    },
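    {
      "cell_type": "markdown",
      "source": [
        "The core mechanism is a threshold rule. A router scores each prompt with an estimate of how likely the strong model is to give the better answer. Prompts scoring at or above a threshold go to the strong model, and the rest go to the cheaper weak model. Calibration then picks the threshold from a sample of scores so that a target share of queries reaches the strong model. The plain-Python sketch below illustrates that idea under these assumptions only: `route` and `calibrate_threshold` are hypothetical helpers, not RouteLLM's API, and the sample scores are made up."
      ],
      "metadata": {}
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "def route(win_rate, threshold, strong_model, weak_model):\n",
        "    # Hypothetical helper: choose the strong model when the router's estimated\n",
        "    # strong-model win rate meets or exceeds the threshold.\n",
        "    return strong_model if win_rate >= threshold else weak_model\n",
        "\n",
        "\n",
        "def calibrate_threshold(win_rates, strong_fraction):\n",
        "    # Hypothetical helper: pick a threshold so that about `strong_fraction`\n",
        "    # of the sample scores are at or above it (an empirical quantile).\n",
        "    if not 0.0 <= strong_fraction <= 1.0:\n",
        "        raise ValueError('strong_fraction must be between 0 and 1')\n",
        "    ordered = sorted(win_rates)\n",
        "    k = round(strong_fraction * len(ordered))  # number of prompts routed to the strong model\n",
        "    if k == 0:\n",
        "        return float('inf')  # no score can clear the bar, so everything goes weak\n",
        "    return ordered[len(ordered) - k]\n",
        "\n",
        "\n",
        "sample = [0.12, 0.35, 0.48, 0.61, 0.90]  # made-up router scores\n",
        "threshold = calibrate_threshold(sample, 0.4)\n",
        "print(threshold)\n",
        "print([route(s, threshold, 'strong-model', 'weak-model') for s in sample])"
      ]
    },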
    {
      "cell_type": "markdown",
      "source": [
        "## Installing the dependencies"
      ],
      "metadata": {
        "id": "rMVUz7qEXqf8"
      }
    },
    {
      "cell_type": "code",
      "execution_count": 6,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 1000
        },
        "id": "syd63lxzVvG4",
        "outputId": "8833ba76-12bb-4e8c-fa1e-bb0ffbc29f0d",
        "collapsed": true
      },
      "outputs": [],
      "source": [
        "!pip install \"routellm[serve,eval]\""
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Loading the OpenAI API Key\n",
        "To get an OpenAI API key, visit https://platform.openai.com/settings/organization/api-keys and generate a new key. If you're a new user, you may need to add billing details and make a minimum payment of $5 before API access is activated.\n",
        "\n",
        "RouteLLM uses LiteLLM for chat completions, so it supports a wide range of open-source and closed-source models. To use a model from another provider, see the list of supported providers at https://litellm.vercel.app/docs/providers."
      ],
      "metadata": {
        "id": "zWk5DS0yXtFz"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "import os\n",
        "from getpass import getpass\n",
        "os.environ['OPENAI_API_KEY'] = getpass('Enter OpenAI API Key: ')"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "_9W9yMvRYDp2",
        "outputId": "1893d124-ce3b-406b-a6ff-1f7f22c2e9e8"
      },
      "execution_count": 2,
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Enter OpenAI API Key: ··········\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Downloading Config File\n",
        "RouteLLM uses a configuration file to locate pretrained router checkpoints and the datasets they were trained on.\n",
        "This file tells the system where to find the models that decide whether to send a query to the strong or weak model.\n",
        "\n",
        "### Do I need to edit it?\n",
        "For most users — no. The default config already points to well-trained routers (mf, bert, causal_llm) that work out of the box.\n",
        "You only need to change it if you plan to:\n",
        "\n",
        "* Train your own router on a custom dataset.\n",
        "\n",
        "* Replace the routing algorithm entirely with a new one.\n",
        "\n",
        "For this tutorial, we’ll keep the config as is and simply:\n",
        "\n",
        "* Set our strong and weak model names in code.\n",
        "\n",
        "* Add our API keys for the chosen providers.\n",
        "\n",
        "* Use a calibrated threshold to balance cost and quality."
      ],
      "metadata": {
        "id": "Ua_hRhzEX3fk"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "!wget https://raw.githubusercontent.com/lm-sys/RouteLLM/main/config.example.yaml"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "wbCwSaVIXH8v",
        "outputId": "532e347d-d739-462c-82de-7aa11e1cb252"
      },
      "execution_count": 50,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "--2025-08-10 14:22:39--  https://raw.githubusercontent.com/lm-sys/RouteLLM/main/config.example.yaml\n",
            "Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...\n",
            "Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.\n",
            "HTTP request sent, awaiting response... 200 OK\n",
            "Length: 417 [text/plain]\n",
            "Saving to: ‘config.example.yaml’\n",
            "\n",
            "\rconfig.example.yaml   0%[                    ]       0  --.-KB/s               \rconfig.example.yaml 100%[===================>]     417  --.-KB/s    in 0s      \n",
            "\n",
            "2025-08-10 14:22:39 (14.1 MB/s) - ‘config.example.yaml’ saved [417/417]\n",
            "\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Initializing the RouteLLM Controller\n",
        "In this code block, we import the necessary libraries and initialize the RouteLLM Controller, which will manage how prompts are routed between models. We specify routers=[\"mf\"] to use the Matrix Factorization router, a pretrained decision model that predicts whether a query should be sent to the strong or weak model.\n",
        "\n",
        "The strong_model parameter is set to **\"gpt-5\"**, a high-quality but more expensive model, while the weak_model parameter is set to **\"o4-mini\"**, a faster and cheaper alternative. For each incoming prompt, the router estimates a *win rate*, the probability that the strong model would give a better response than the weak model, and compares it with a threshold that you supply when making the request. Prompts at or above the threshold go to the strong model and the rest go to the weak model, so simple tasks are handled cheaply while more challenging ones get the stronger model’s capabilities.\n",
        "\n",
        "This configuration allows you to balance cost efficiency and response quality without manual intervention."
      ],
      "metadata": {
        "id": "60dFrRPwYvE5"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "import os\n",
        "import pandas as pd\n",
        "from routellm.controller import Controller\n",
        "\n",
        "client = Controller(\n",
        "    routers=[\"mf\"],  # Matrix Factorization (mf) router\n",
        "    strong_model=\"gpt-5\",\n",
        "    weak_model=\"o4-mini\"\n",
        ")\n"
      ],
      "metadata": {
        "id": "7jTTS92pYmBW"
      },
      "execution_count": 25,
      "outputs": []
    },
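    {
      "cell_type": "markdown",
      "source": [
        "The routing rule itself is simple. Below is a minimal sketch, assuming that a prompt goes to the strong model when the router's predicted win rate is at or above the threshold. The `choose_model` helper is hypothetical (it is not part of the RouteLLM API), and the two win rates are the ones the mf router reports for the train and *Pride and Prejudice* prompts later in this notebook."
      ],
      "metadata": {}
    },
    {
      "cell_type": "code",
      "source": [
        "# Hypothetical helper illustrating the threshold rule; not part of RouteLLM.\n",
        "def choose_model(win_rate, threshold, strong_model='gpt-5', weak_model='o4-mini'):\n",
        "    # win_rate: the router's estimated probability that the strong model\n",
        "    # gives the better answer for this prompt.\n",
        "    if not 0.0 <= win_rate <= 1.0:\n",
        "        raise ValueError('win_rate must be a probability in [0, 1]')\n",
        "    return strong_model if win_rate >= threshold else weak_model\n",
        "\n",
        "# Win rates reported by the mf router later in this notebook.\n",
        "print(choose_model(0.303087, threshold=0.24034))  # train prompt -> gpt-5\n",
        "print(choose_model(0.175543, threshold=0.24034))  # 'Pride and Prejudice' prompt -> o4-mini"
      ],
      "metadata": {},
      "execution_count": null,
      "outputs": []
    },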
    {
      "cell_type": "code",
      "source": [
        "!python -m routellm.calibrate_threshold --routers mf --strong-model-pct 0.1 --config config.example.yaml"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "yCmv0yPtJ6ME",
        "outputId": "78884eec-f373-4f10-b455-7a7e8db85799"
      },
      "execution_count": 37,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "For 10.0% strong model calls for mf, threshold = 0.24034\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "This command runs RouteLLM’s threshold calibration process for the Matrix Factorization (mf) router. The --strong-model-pct 0.1 argument tells the system to find the threshold value that routes roughly 10% of queries to the strong model (and the rest to the weak model). Because this share is estimated on the reference queries used for calibration, the actual proportion in your own traffic may differ.\n",
        "\n",
        "Using the router settings in config.example.yaml (passed via --config), the calibration produced:\n",
        "\n",
        "**For 10% strong-model calls with mf, the calibrated threshold is 0.24034.**\n",
        "\n",
        "This means that any query whose win rate (the router's estimated probability that the strong model would outperform the weak one) is at or above 0.24034 will be sent to the strong model, while those below it will go to the weak model, matching the desired cost–quality trade-off."
      ],
      "metadata": {
        "id": "FbdKIoekY-JG"
      }
    },
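    {
      "cell_type": "markdown",
      "source": [
        "Conceptually, calibration chooses the threshold as a quantile of the router's win rates over a reference set of queries: to send a fraction *p* of calls to the strong model, take the (1 - *p*) quantile. The sketch below illustrates the idea with NumPy on synthetic win rates. The Beta-distributed sample and the `calibrate_threshold` function here are illustrative assumptions, not RouteLLM's implementation; RouteLLM's script computes the threshold from its own reference data, so the value printed here will not match 0.24034."
      ],
      "metadata": {}
    },
    {
      "cell_type": "code",
      "source": [
        "import numpy as np\n",
        "\n",
        "def calibrate_threshold(win_rates, strong_model_pct):\n",
        "    # Threshold such that roughly `strong_model_pct` of the win rates lie at or above it,\n",
        "    # i.e. the (1 - strong_model_pct) quantile of the distribution.\n",
        "    if not 0.0 < strong_model_pct < 1.0:\n",
        "        raise ValueError('strong_model_pct must be between 0 and 1')\n",
        "    return float(np.quantile(win_rates, 1.0 - strong_model_pct))\n",
        "\n",
        "# Synthetic win rates standing in for real traffic (illustrative assumption).\n",
        "rng = np.random.default_rng(seed=0)\n",
        "sample_win_rates = rng.beta(2.0, 8.0, size=10_000)\n",
        "\n",
        "sketch_threshold = calibrate_threshold(sample_win_rates, strong_model_pct=0.1)\n",
        "share_strong = float(np.mean(sample_win_rates >= sketch_threshold))\n",
        "print(f'threshold = {sketch_threshold:.5f}, share routed to strong = {share_strong:.1%}')"
      ],
      "metadata": {},
      "execution_count": null,
      "outputs": []
    },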
    {
      "cell_type": "markdown",
      "source": [
        "## Defining the Threshold and Test Prompts\n",
        "Here, we define a diverse set of test prompts designed to cover a range of complexity levels.\n",
        "They include simple factual questions (likely to be routed to the weak model), medium reasoning tasks (borderline threshold cases), and high-complexity or creative requests (more suited for the strong model), along with code generation tasks to test technical capabilities."
      ],
      "metadata": {
        "id": "_yxQG2j4ZJMk"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "threshold = 0.24034\n",
        "\n",
        "prompts = [\n",
        "    # Easy factual (likely weak model)\n",
        "    \"Who wrote the novel 'Pride and Prejudice'?\",\n",
        "    \"What is the largest planet in our solar system?\",\n",
        "\n",
        "    # Medium reasoning (borderline cases)\n",
        "    \"If a train leaves at 3 PM and travels 60 km/h, how far will it travel by 6:30 PM?\",\n",
        "    \"Explain why the sky appears blue during the day and red/orange during sunset.\",\n",
        "\n",
        "    # High complexity / creative (likely strong model)\n",
        "    \"Write a 6-line rap verse about climate change using internal rhyme.\",\n",
        "    \"Summarize the differences between supervised, unsupervised, and reinforcement learning with examples.\",\n",
        "\n",
        "    # Code generation\n",
        "    \"Write a Python function to check if a given string is a palindrome, ignoring punctuation and spaces.\",\n",
        "    \"Generate SQL to find the top 3 highest-paying customers from a 'sales' table.\"\n",
        "]\n"
      ],
      "metadata": {
        "id": "l28cTrNiUf9q"
      },
      "execution_count": 45,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Evaluating Win Rate\n",
        "The following code computes the mf router's win rate for each test prompt, that is, its estimate of how likely the strong model is to outperform the weak model.\n",
        "Against the calibrated threshold of 0.24034, two prompts exceed the threshold and would be routed to the strong model:\n",
        "\n",
        "**\"If a train leaves at 3 PM and travels 60 km/h, how far will it travel by 6:30 PM?\"** (0.303087)\n",
        "\n",
        "**\"Write a Python function to check if a given string is a palindrome, ignoring punctuation and spaces.\"** (0.272534)\n",
        "\n",
        "All other prompts fall below the threshold, so they would be served by the weaker, cheaper model.\n",
        "Note that neither of these two is among the prompts we labelled as high complexity: the router's scores do not always line up with intuitive difficulty labels."
      ],
      "metadata": {
        "id": "au7bf6LXZkSn"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "win_rates = client.batch_calculate_win_rate(prompts=pd.Series(prompts), router=\"mf\")\n",
        "\n",
        "# Store results in DataFrame\n",
        "_df = pd.DataFrame({\n",
        "    \"Prompt\": prompts,\n",
        "    \"Win_Rate\": win_rates\n",
        "})\n",
        "\n",
        "# Show full text without truncation\n",
        "pd.set_option('display.max_colwidth', None)"
      ],
      "metadata": {
        "id": "wDE0oMJYWmth"
      },
      "execution_count": 47,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "_df"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 300
        },
        "id": "Y3PWJsaSWuKk",
        "outputId": "4787125c-9087-4762-c9fb-8f25ccb80472"
      },
      "execution_count": 48,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "                                                                                                  Prompt  \\\n",
              "0                                                             Who wrote the novel 'Pride and Prejudice'?   \n",
              "1                                                        What is the largest planet in our solar system?   \n",
              "2                      If a train leaves at 3 PM and travels 60 km/h, how far will it travel by 6:30 PM?   \n",
              "3                          Explain why the sky appears blue during the day and red/orange during sunset.   \n",
              "4                                    Write a 6-line rap verse about climate change using internal rhyme.   \n",
              "5  Summarize the differences between supervised, unsupervised, and reinforcement learning with examples.   \n",
              "6   Write a Python function to check if a given string is a palindrome, ignoring punctuation and spaces.   \n",
              "7                          Generate SQL to find the top 3 highest-paying customers from a 'sales' table.   \n",
              "\n",
              "   Win_Rate  \n",
              "0  0.175543  \n",
              "1  0.129442  \n",
              "2  0.303087  \n",
              "3  0.084880  \n",
              "4  0.135652  \n",
              "5  0.109009  \n",
              "6  0.272534  \n",
              "7  0.133232  "
            ],
            "application/vnd.google.colaboratory.intrinsic+json": {
              "type": "dataframe",
              "summary": "{\n  \"name\": \"_df\",\n  \"rows\": 8,\n  \"fields\": [\n    {\n      \"column\": \"Prompt\",\n      \"properties\": {\n        \"dtype\": \"string\",\n        \"num_unique_values\": 8,\n        \"samples\": [\n          \"What is the largest planet in our solar system?\",\n          \"Summarize the differences between supervised, unsupervised, and reinforcement learning with examples.\",\n          \"Who wrote the novel 'Pride and Prejudice'?\"\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"Win_Rate\",\n      \"properties\": {\n        \"dtype\": \"number\",\n        \"std\": 0.07870691259203437,\n        \"min\": 0.08488038927316666,\n        \"max\": 0.3030872642993927,\n        \"num_unique_values\": 8,\n        \"samples\": [\n          0.12944217026233673,\n          0.10900914669036865,\n          0.17554272711277008\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    }\n  ]\n}"
            }
          },
          "metadata": {},
          "execution_count": 48
        }
      ]
    },
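    {
      "cell_type": "markdown",
      "source": [
        "To check the table against the calibrated threshold programmatically, you can add a column recording where each prompt would be routed. So that it runs on its own, this sketch rebuilds the table from the win rates printed above, using short labels in place of the full prompts; in the notebook you could apply the same expression to `_df` and `threshold` directly. The `Routed_To` column name and the labels are our own choice."
      ],
      "metadata": {}
    },
    {
      "cell_type": "code",
      "source": [
        "import pandas as pd\n",
        "\n",
        "calibrated_threshold = 0.24034  # from the calibration step above\n",
        "\n",
        "# Win rates copied from the mf router output above; labels abbreviate the prompts.\n",
        "routing_table = pd.DataFrame({\n",
        "    'Prompt': ['Pride and Prejudice author', 'Largest planet', 'Train distance',\n",
        "               'Sky color', 'Climate rap verse', 'ML paradigms',\n",
        "               'Palindrome function', 'Top-3 customers SQL'],\n",
        "    'Win_Rate': [0.175543, 0.129442, 0.303087, 0.084880,\n",
        "                 0.135652, 0.109009, 0.272534, 0.133232],\n",
        "})\n",
        "\n",
        "# Same rule as the router: at or above the threshold -> strong model.\n",
        "routing_table['Routed_To'] = routing_table['Win_Rate'].ge(calibrated_threshold).map(\n",
        "    {True: 'strong', False: 'weak'}\n",
        ")\n",
        "print(routing_table.sort_values('Win_Rate', ascending=False))"
      ],
      "metadata": {},
      "execution_count": null,
      "outputs": []
    },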
    {
      "cell_type": "markdown",
      "source": [
        "These results can also guide tuning of the routing strategy: by analyzing the win-rate distribution, we can adjust the threshold to better balance cost savings against response quality."
      ],
      "metadata": {
        "id": "BJduJIvEaYnI"
      }
    },
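    {
      "cell_type": "markdown",
      "source": [
        "For example, you can sweep a few candidate thresholds and see what share of these prompts would reach the strong model at each. The sketch below uses the eight win rates from the table above; with so few prompts the shares are coarse, so in practice you would repeat this on a larger, representative sample of your own queries. The candidate thresholds and the `strong_share_at` helper are illustrative, not part of RouteLLM."
      ],
      "metadata": {}
    },
    {
      "cell_type": "code",
      "source": [
        "import numpy as np\n",
        "\n",
        "# Win rates from the mf router output above.\n",
        "observed_win_rates = np.array([0.175543, 0.129442, 0.303087, 0.084880,\n",
        "                               0.135652, 0.109009, 0.272534, 0.133232])\n",
        "\n",
        "def strong_share_at(win_rates, threshold):\n",
        "    # Fraction of prompts whose win rate is at or above the threshold.\n",
        "    return float(np.mean(np.asarray(win_rates) >= threshold))\n",
        "\n",
        "for t in (0.10, 0.15, 0.20, 0.24034, 0.30):\n",
        "    share = strong_share_at(observed_win_rates, t)\n",
        "    print(f'threshold {t:.5f}: {share:.1%} of prompts go to the strong model')"
      ],
      "metadata": {},
      "execution_count": null,
      "outputs": []
    },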
    {
      "cell_type": "markdown",
      "source": [
        "## Routing Prompts Through the Calibrated Matrix Factorization (mf) Router\n",
        "This code iterates over the list of test prompts and sends each one to the RouteLLM controller using the calibrated mf router with the specified threshold (router-mf-{threshold}).\n",
        "\n",
        "For each prompt, the router decides whether to use the strong or weak model based on the calculated win rate.\n",
        "\n",
        "The response includes both the generated output and, in its `model` field, the model that actually served the request (reported with its dated version name, such as o4-mini-2025-04-16).\n",
        "\n",
        "These details — the prompt, model used, and generated output — are stored in the results list for later analysis."
      ],
      "metadata": {
        "id": "h_wLkBlmZ3Y_"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "results = []\n",
        "for prompt in prompts:\n",
        "    response = client.chat.completions.create(\n",
        "        model=f\"router-mf-{threshold}\",\n",
        "        messages=[{\"role\": \"user\", \"content\": prompt}]\n",
        "    )\n",
        "    message = response.choices[0].message[\"content\"]\n",
        "    model_used = response.model  # RouteLLM returns the model actually used\n",
        "\n",
        "    results.append({\n",
        "        \"Prompt\": prompt,\n",
        "        \"Model Used\": model_used,\n",
        "        \"Output\": message\n",
        "    })\n"
      ],
      "metadata": {
        "id": "a5fT0C1IHHaU"
      },
      "execution_count": 39,
      "outputs": []
    },
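    {
      "cell_type": "markdown",
      "source": [
        "The `model` argument packs the router name and threshold into a single string of the form `router-<router name>-<threshold>`. When experimenting with several thresholds it can help to build and parse these strings in one place. The two helpers below are hypothetical conveniences, not part of RouteLLM, and assume the router name itself contains no hyphen (true for `mf`, `bert`, and `causal_llm`)."
      ],
      "metadata": {}
    },
    {
      "cell_type": "code",
      "source": [
        "# Hypothetical helpers for the 'router-<name>-<threshold>' model string.\n",
        "def build_router_model(router, threshold):\n",
        "    return f'router-{router}-{threshold}'\n",
        "\n",
        "def parse_router_model(model):\n",
        "    # Split on the first two hyphens only, so the threshold stays intact.\n",
        "    prefix, router, threshold = model.split('-', 2)\n",
        "    if prefix != 'router':\n",
        "        raise ValueError(f'not a router model string: {model!r}')\n",
        "    return router, float(threshold)\n",
        "\n",
        "model_name = build_router_model('mf', 0.24034)\n",
        "print(model_name)                      # router-mf-0.24034\n",
        "print(parse_router_model(model_name))  # ('mf', 0.24034)"
      ],
      "metadata": {},
      "execution_count": null,
      "outputs": []
    },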
    {
      "cell_type": "markdown",
      "source": [
        "## Displaying the Results"
      ],
      "metadata": {
        "id": "9wfsMk0hZ838"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "pd.set_option(\"display.max_columns\", None)\n",
        "pd.set_option(\"display.max_rows\", None)\n",
        "pd.set_option(\"display.max_colwidth\", None)\n",
        "df = pd.DataFrame(results)\n",
        "df"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 960
        },
        "id": "Nywmvg74HR7J",
        "outputId": "82f580f9-c2c3-4e7f-8aaa-7228b657f622"
      },
      "execution_count": 41,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "                                                                                                  Prompt  \\\n",
              "0                                                             Who wrote the novel 'Pride and Prejudice'?   \n",
              "1                                                        What is the largest planet in our solar system?   \n",
              "2                      If a train leaves at 3 PM and travels 60 km/h, how far will it travel by 6:30 PM?   \n",
              "3                          Explain why the sky appears blue during the day and red/orange during sunset.   \n",
              "4                                    Write a 6-line rap verse about climate change using internal rhyme.   \n",
              "5  Summarize the differences between supervised, unsupervised, and reinforcement learning with examples.   \n",
              "6   Write a Python function to check if a given string is a palindrome, ignoring punctuation and spaces.   \n",
              "7                          Generate SQL to find the top 3 highest-paying customers from a 'sales' table.   \n",
              "\n",
              "           Model Used  \\\n",
              "0  o4-mini-2025-04-16   \n",
              "1  o4-mini-2025-04-16   \n",
              "2    gpt-5-2025-08-07   \n",
              "3  o4-mini-2025-04-16   \n",
              "4  o4-mini-2025-04-16   \n",
              "5  o4-mini-2025-04-16   \n",
              "6    gpt-5-2025-08-07   \n",
              "7  o4-mini-2025-04-16   \n",
              "\n",
              "                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 
                                                                                                                                                            Output  \n",
              "0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
       The novel “Pride and Prejudice” was written by the English author Jane Austen. It was first published in 1813 under the original title “First Impressions.”  \n",
              "1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   The largest planet in our solar system is Jupiter.  
\\n\\nKey facts about Jupiter:  \\n• Diameter: about 142,984 km (≈11 times that of Earth)  \\n• Mass: roughly 1.90 × 10^27 kg (over 300 times Earth’s mass)  \\n• Composition: primarily hydrogen and helium (a gas giant)  \\n• Notable features: the Great Red Spot (a giant storm), faint ring system, and at least 79 known moons (including Ganymede, the largest moon in the solar system).  \n",
              "2                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                         210 km\\n\\nCalculation:\\n- Time from 3:00 PM to 6:30 PM = 3.5 hours\\n- Distance = 60 km/h × 3.5 h = 210 km  \n",
              "3                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Sunlight is made of all visible colors, but as it passes through Earth’s atmosphere it doesn’t all reach your eyes the same way. Two key effects explain why the sky looks blue by day and red/orange at sunrise or sunset:\\n\\n1. Rayleigh scattering and the blue sky  \\n   • Air molecules (mostly nitrogen and oxygen) are much smaller than the wavelength of visible light.  \\n   • Rayleigh scattering says that the amount of scattering varies inversely with the fourth power of wavelength (∝1/λ⁴).  \\n   • Short (blue, ~450 nm) wavelengths scatter about ten times more than long (red, ~700 nm) wavelengths.  \\n   • When the Sun is high, sunlight travels a relatively short path through air, so blue light is scattered in all directions, filling the sky with that color.  \\n\\n2. Long path length at sunrise/sunset and the reds and oranges  \\n   • Near the horizon, sunlight must traverse a much thicker “slice” of atmosphere.  \\n   • Blue and green light suffer so much scattering out of the direct beam that very little remains on the path to your eye.  \\n   • Red and orange (longer wavelengths) are scattered much less, so more of that light reaches you directly, giving the Sun—and the surrounding sky—a warm reddish hue.  \\n\\n3. Role of aerosols and dust  \\n   • Particles larger than molecules (dust, water droplets, pollution) scatter all wavelengths more equally (Mie scattering), often intensifying reds and pinks at dawn and dusk.  
\\n\\nIn summary, daytime blue comes from efficient scattering of short‐wavelength light; sunrise/sunset reds and oranges occur because the longer atmospheric path removes most of the blues and greens, leaving the reds to dominate.  \n",
              "4                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Heat beats streets, sweat sweeps through concrete, retreat’s defeat  \\nOceans rising, surprising tides decide how we survive  \\nStorms swarm the norm, warn of harm born to alarm  \\nGreen dreams gleam in clean 
streams, redeeming seams of schemes  \\nSolar scholars holler, collaring rays, powering brighter days  \\nHands band for the land, stand grand, demand we expand our plan  \n",
              "5  Here’s a concise comparison of the three major paradigms in machine learning:\\n\\n1. Supervised Learning  \\n   • Goal: Learn a mapping from inputs X to outputs Y using labeled examples.  \\n   • Feedback: Direct, per‐example “correct” answers (labels).  \\n   • Common tasks:  \\n     – Classification (e.g. spam vs. non‐spam email)  \\n     – Regression (e.g. predicting house prices)  \\n   • Examples:  \\n     – Image-classification networks trained on photos labeled “cat” or “dog.”  \\n     – A model that predicts tomorrow’s temperature from historical weather data.  \\n\\n2. Unsupervised Learning  \\n   • Goal: Discover hidden structure or patterns in unlabeled data.  \\n   • Feedback: None (no explicit labels).  \\n   • Common tasks:  \\n     – Clustering (e.g. segmenting customers into market‐segments via k-means)  \\n     – Dimensionality reduction (e.g. PCA for feature compression)  \\n     – Anomaly detection (e.g. flagging credit-card fraud)  \\n   • Examples:  \\n     – Grouping similar news articles by topic when you have no topic labels.  \\n     – Reducing the number of features to visualize high-dimensional data.  \\n\\n3. Reinforcement Learning (RL)  \\n   • Goal: Learn a policy that maximizes cumulative reward in an environment.  \\n   • Feedback: Scalar reward signal (often delayed), no direct “correct” action.  \\n   • Common tasks:  \\n     – Control (e.g. robotic arm manipulation)  \\n     – Game playing (e.g. AlphaZero in chess and Go)  \\n     – Resource management (e.g. dynamic pricing, traffic‐signal control)  \\n   • Examples:  \\n     – An agent learning to play Pong by trial and error, receiving +1 for winning a point and –1 for losing one.  \\n     – A self-driving car learning to navigate safely through rewards for staying on the road and penalties for collisions.  \\n\\nKey distinctions at a glance:  \\n• Data: supervised uses labeled data; unsupervised uses unlabeled; RL interacts with an environment.  \\n• Feedback: supervised gets exact labels; unsupervised gets none; RL gets only reward signals.  \\n• Objective: supervised predicts outputs; unsupervised finds structure; RL discovers optimal actions over time.  \n",
              "6                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          def is_palindrome(text: str) -> bool:\\n    \"\"\"\\n    Return True if text is a 
palindrome, ignoring punctuation and spaces (case-insensitive).\\n    \"\"\"\\n    filtered = [ch.lower() for ch in text if ch.isalnum()]\\n    return filtered == filtered[::-1]  \n",
              "7                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         Here’s the simplest version in MySQL (and Postgres) – sum each customer’s spend, sort descending, then LIMIT 3:\\n\\n```sql\\nSELECT\\n  customer_id,\\n  SUM(amount) AS total_spent\\nFROM sales\\nGROUP BY customer_id\\nORDER BY total_spent DESC\\nLIMIT 3;\\n```\\n\\n————————————————————————————\\n\\nIf you’re on SQL Server:\\n\\n```sql\\nSELECT TOP 3\\n  customer_id,\\n  SUM(amount) AS total_spent\\nFROM sales\\nGROUP BY customer_id\\nORDER BY total_spent DESC;\\n```\\n\\n————————————————————————————\\n\\nIn Oracle 12c+ (or any DB that supports ANSI FETCH):\\n\\n```sql\\nSELECT\\n  customer_id,\\n  SUM(amount) AS total_spent\\nFROM sales\\nGROUP BY customer_id\\nORDER BY total_spent DESC\\nFETCH FIRST 3 ROWS ONLY;\\n```\\n\\nIn pre-12c Oracle you can wrap in a subquery:\\n\\n```sql\\nSELECT customer_id, total_spent\\nFROM (\\n  SELECT\\n    customer_id,\\n    SUM(amount) AS total_spent\\n  FROM sales\\n  GROUP BY customer_id\\n  ORDER BY total_spent DESC\\n)\\nWHERE ROWNUM <= 3;\\n```\\n\\n————————————————————————————\\n\\nIf you need to handle ties or want an analytic-function approach (works in most 
engines):\\n\\n```sql\\nSELECT customer_id, total_spent\\nFROM (\\n  SELECT\\n    customer_id,\\n    SUM(amount) AS total_spent,\\n    ROW_NUMBER() OVER (ORDER BY SUM(amount) DESC) AS rn\\n  FROM sales\\n  GROUP BY customer_id\\n) t\\nWHERE rn <= 3;\\n```  "
            ],
            "text/html": [
              "\n",
              "  <div id=\"df-61d03912-fbad-49b2-bf91-4778aeceea8a\" class=\"colab-df-container\">\n",
              "    <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>Prompt</th>\n",
              "      <th>Model Used</th>\n",
              "      <th>Output</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>Who wrote the novel 'Pride and Prejudice'?</td>\n",
              "      <td>o4-mini-2025-04-16</td>\n",
              "      <td>The novel “Pride and Prejudice” was written by the English author Jane Austen. It was first published in 1813 under the original title “First Impressions.”</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>What is the largest planet in our solar system?</td>\n",
              "      <td>o4-mini-2025-04-16</td>\n",
              "      <td>The largest planet in our solar system is Jupiter.  \\n\\nKey facts about Jupiter:  \\n• Diameter: about 142,984 km (≈11 times that of Earth)  \\n• Mass: roughly 1.90 × 10^27 kg (over 300 times Earth’s mass)  \\n• Composition: primarily hydrogen and helium (a gas giant)  \\n• Notable features: the Great Red Spot (a giant storm), faint ring system, and at least 79 known moons (including Ganymede, the largest moon in the solar system).</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>If a train leaves at 3 PM and travels 60 km/h, how far will it travel by 6:30 PM?</td>\n",
              "      <td>gpt-5-2025-08-07</td>\n",
              "      <td>210 km\\n\\nCalculation:\\n- Time from 3:00 PM to 6:30 PM = 3.5 hours\\n- Distance = 60 km/h × 3.5 h = 210 km</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>Explain why the sky appears blue during the day and red/orange during sunset.</td>\n",
              "      <td>o4-mini-2025-04-16</td>\n",
              "      <td>Sunlight is made of all visible colors, but as it passes through Earth’s atmosphere it doesn’t all reach your eyes the same way. Two key effects explain why the sky looks blue by day and red/orange at sunrise or sunset:\\n\\n1. Rayleigh scattering and the blue sky  \\n   • Air molecules (mostly nitrogen and oxygen) are much smaller than the wavelength of visible light.  \\n   • Rayleigh scattering says that the amount of scattering varies inversely with the fourth power of wavelength (∝1/λ⁴).  \\n   • Short (blue, ~450 nm) wavelengths scatter about ten times more than long (red, ~700 nm) wavelengths.  \\n   • When the Sun is high, sunlight travels a relatively short path through air, so blue light is scattered in all directions, filling the sky with that color.  \\n\\n2. Long path length at sunrise/sunset and the reds and oranges  \\n   • Near the horizon, sunlight must traverse a much thicker “slice” of atmosphere.  \\n   • Blue and green light suffer so much scattering out of the direct beam that very little remains on the path to your eye.  \\n   • Red and orange (longer wavelengths) are scattered much less, so more of that light reaches you directly, giving the Sun—and the surrounding sky—a warm reddish hue.  \\n\\n3. Role of aerosols and dust  \\n   • Particles larger than molecules (dust, water droplets, pollution) scatter all wavelengths more equally (Mie scattering), often intensifying reds and pinks at dawn and dusk.  \\n\\nIn summary, daytime blue comes from efficient scattering of short‐wavelength light; sunrise/sunset reds and oranges occur because the longer atmospheric path removes most of the blues and greens, leaving the reds to dominate.</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>Write a 6-line rap verse about climate change using internal rhyme.</td>\n",
              "      <td>o4-mini-2025-04-16</td>\n",
              "      <td>Heat beats streets, sweat sweeps through concrete, retreat’s defeat  \\nOceans rising, surprising tides decide how we survive  \\nStorms swarm the norm, warn of harm born to alarm  \\nGreen dreams gleam in clean streams, redeeming seams of schemes  \\nSolar scholars holler, collaring rays, powering brighter days  \\nHands band for the land, stand grand, demand we expand our plan</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>5</th>\n",
              "      <td>Summarize the differences between supervised, unsupervised, and reinforcement learning with examples.</td>\n",
              "      <td>o4-mini-2025-04-16</td>\n",
              "      <td>Here’s a concise comparison of the three major paradigms in machine learning:\\n\\n1. Supervised Learning  \\n   • Goal: Learn a mapping from inputs X to outputs Y using labeled examples.  \\n   • Feedback: Direct, per‐example “correct” answers (labels).  \\n   • Common tasks:  \\n     – Classification (e.g. spam vs. non‐spam email)  \\n     – Regression (e.g. predicting house prices)  \\n   • Examples:  \\n     – Image-classification networks trained on photos labeled “cat” or “dog.”  \\n     – A model that predicts tomorrow’s temperature from historical weather data.  \\n\\n2. Unsupervised Learning  \\n   • Goal: Discover hidden structure or patterns in unlabeled data.  \\n   • Feedback: None (no explicit labels).  \\n   • Common tasks:  \\n     – Clustering (e.g. segmenting customers into market‐segments via k-means)  \\n     – Dimensionality reduction (e.g. PCA for feature compression)  \\n     – Anomaly detection (e.g. flagging credit-card fraud)  \\n   • Examples:  \\n     – Grouping similar news articles by topic when you have no topic labels.  \\n     – Reducing the number of features to visualize high-dimensional data.  \\n\\n3. Reinforcement Learning (RL)  \\n   • Goal: Learn a policy that maximizes cumulative reward in an environment.  \\n   • Feedback: Scalar reward signal (often delayed), no direct “correct” action.  \\n   • Common tasks:  \\n     – Control (e.g. robotic arm manipulation)  \\n     – Game playing (e.g. AlphaZero in chess and Go)  \\n     – Resource management (e.g. dynamic pricing, traffic‐signal control)  \\n   • Examples:  \\n     – An agent learning to play Pong by trial and error, receiving +1 for winning a point and –1 for losing one.  \\n     – A self-driving car learning to navigate safely through rewards for staying on the road and penalties for collisions.  \\n\\nKey distinctions at a glance:  \\n• Data: supervised uses labeled data; unsupervised uses unlabeled; RL interacts with an environment.  \\n• Feedback: supervised gets exact labels; unsupervised gets none; RL gets only reward signals.  \\n• Objective: supervised predicts outputs; unsupervised finds structure; RL discovers optimal actions over time.</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>6</th>\n",
              "      <td>Write a Python function to check if a given string is a palindrome, ignoring punctuation and spaces.</td>\n",
              "      <td>gpt-5-2025-08-07</td>\n",
              "      <td>def is_palindrome(text: str) -&gt; bool:\\n    \"\"\"\\n    Return True if text is a palindrome, ignoring punctuation and spaces (case-insensitive).\\n    \"\"\"\\n    filtered = [ch.lower() for ch in text if ch.isalnum()]\\n    return filtered == filtered[::-1]</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>7</th>\n",
              "      <td>Generate SQL to find the top 3 highest-paying customers from a 'sales' table.</td>\n",
              "      <td>o4-mini-2025-04-16</td>\n",
              "      <td>Here’s the simplest version in MySQL (and Postgres) – sum each customer’s spend, sort descending, then LIMIT 3:\\n\\n```sql\\nSELECT\\n  customer_id,\\n  SUM(amount) AS total_spent\\nFROM sales\\nGROUP BY customer_id\\nORDER BY total_spent DESC\\nLIMIT 3;\\n```\\n\\n————————————————————————————\\n\\nIf you’re on SQL Server:\\n\\n```sql\\nSELECT TOP 3\\n  customer_id,\\n  SUM(amount) AS total_spent\\nFROM sales\\nGROUP BY customer_id\\nORDER BY total_spent DESC;\\n```\\n\\n————————————————————————————\\n\\nIn Oracle 12c+ (or any DB that supports ANSI FETCH):\\n\\n```sql\\nSELECT\\n  customer_id,\\n  SUM(amount) AS total_spent\\nFROM sales\\nGROUP BY customer_id\\nORDER BY total_spent DESC\\nFETCH FIRST 3 ROWS ONLY;\\n```\\n\\nIn pre-12c Oracle you can wrap in a subquery:\\n\\n```sql\\nSELECT customer_id, total_spent\\nFROM (\\n  SELECT\\n    customer_id,\\n    SUM(amount) AS total_spent\\n  FROM sales\\n  GROUP BY customer_id\\n  ORDER BY total_spent DESC\\n)\\nWHERE ROWNUM &lt;= 3;\\n```\\n\\n————————————————————————————\\n\\nIf you need to handle ties or want an analytic-function approach (works in most engines):\\n\\n```sql\\nSELECT customer_id, total_spent\\nFROM (\\n  SELECT\\n    customer_id,\\n    SUM(amount) AS total_spent,\\n    ROW_NUMBER() OVER (ORDER BY SUM(amount) DESC) AS rn\\n  FROM sales\\n  GROUP BY customer_id\\n) t\\nWHERE rn &lt;= 3;\\n```</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>\n",
              "    <div class=\"colab-df-buttons\">\n",
              "\n",
              "  <div class=\"colab-df-container\">\n",
              "    <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-61d03912-fbad-49b2-bf91-4778aeceea8a')\"\n",
              "            title=\"Convert this dataframe to an interactive table.\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n",
              "    <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n",
              "  </svg>\n",
              "    </button>\n",
              "\n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    .colab-df-buttons div {\n",
              "      margin-bottom: 4px;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "    <script>\n",
              "      const buttonEl =\n",
              "        document.querySelector('#df-61d03912-fbad-49b2-bf91-4778aeceea8a button.colab-df-convert');\n",
              "      buttonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "      async function convertToInteractive(key) {\n",
              "        const element = document.querySelector('#df-61d03912-fbad-49b2-bf91-4778aeceea8a');\n",
              "        const dataTable =\n",
              "          await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                    [key], {});\n",
              "        if (!dataTable) return;\n",
              "\n",
              "        const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "          '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "          + ' to learn more about interactive tables.';\n",
              "        element.innerHTML = '';\n",
              "        dataTable['output_type'] = 'display_data';\n",
              "        await google.colab.output.renderOutput(dataTable, element);\n",
              "        const docLink = document.createElement('div');\n",
              "        docLink.innerHTML = docLinkHtml;\n",
              "        element.appendChild(docLink);\n",
              "      }\n",
              "    </script>\n",
              "  </div>\n",
              "\n",
              "\n",
              "    <div id=\"df-c0475337-af1e-4f24-92ac-2ac3358d4d07\">\n",
              "      <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-c0475337-af1e-4f24-92ac-2ac3358d4d07')\"\n",
              "                title=\"Suggest charts\"\n",
              "                style=\"display:none;\">\n",
              "\n",
              "<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "     width=\"24px\">\n",
              "    <g>\n",
              "        <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
              "    </g>\n",
              "</svg>\n",
              "      </button>\n",
              "\n",
              "<style>\n",
              "  .colab-df-quickchart {\n",
              "      --bg-color: #E8F0FE;\n",
              "      --fill-color: #1967D2;\n",
              "      --hover-bg-color: #E2EBFA;\n",
              "      --hover-fill-color: #174EA6;\n",
              "      --disabled-fill-color: #AAA;\n",
              "      --disabled-bg-color: #DDD;\n",
              "  }\n",
              "\n",
              "  [theme=dark] .colab-df-quickchart {\n",
              "      --bg-color: #3B4455;\n",
              "      --fill-color: #D2E3FC;\n",
              "      --hover-bg-color: #434B5C;\n",
              "      --hover-fill-color: #FFFFFF;\n",
              "      --disabled-bg-color: #3B4455;\n",
              "      --disabled-fill-color: #666;\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart {\n",
              "    background-color: var(--bg-color);\n",
              "    border: none;\n",
              "    border-radius: 50%;\n",
              "    cursor: pointer;\n",
              "    display: none;\n",
              "    fill: var(--fill-color);\n",
              "    height: 32px;\n",
              "    padding: 0;\n",
              "    width: 32px;\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart:hover {\n",
              "    background-color: var(--hover-bg-color);\n",
              "    box-shadow: 0 1px 2px rgba(60, 64, 67, 0.3), 0 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "    fill: var(--button-hover-fill-color);\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart-complete:disabled,\n",
              "  .colab-df-quickchart-complete:disabled:hover {\n",
              "    background-color: var(--disabled-bg-color);\n",
              "    fill: var(--disabled-fill-color);\n",
              "    box-shadow: none;\n",
              "  }\n",
              "\n",
              "  .colab-df-spinner {\n",
              "    border: 2px solid var(--fill-color);\n",
              "    border-color: transparent;\n",
              "    border-bottom-color: var(--fill-color);\n",
              "    animation:\n",
              "      spin 1s steps(1) infinite;\n",
              "  }\n",
              "\n",
              "  @keyframes spin {\n",
              "    0% {\n",
              "      border-color: transparent;\n",
              "      border-bottom-color: var(--fill-color);\n",
              "      border-left-color: var(--fill-color);\n",
              "    }\n",
              "    20% {\n",
              "      border-color: transparent;\n",
              "      border-left-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "    }\n",
              "    30% {\n",
              "      border-color: transparent;\n",
              "      border-left-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "      border-right-color: var(--fill-color);\n",
              "    }\n",
              "    40% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "    }\n",
              "    60% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "    }\n",
              "    80% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "      border-bottom-color: var(--fill-color);\n",
              "    }\n",
              "    90% {\n",
              "      border-color: transparent;\n",
              "      border-bottom-color: var(--fill-color);\n",
              "    }\n",
              "  }\n",
              "</style>\n",
              "\n",
              "      <script>\n",
              "        async function quickchart(key) {\n",
              "          const quickchartButtonEl =\n",
              "            document.querySelector('#' + key + ' button');\n",
              "          quickchartButtonEl.disabled = true;  // To prevent multiple clicks.\n",
              "          quickchartButtonEl.classList.add('colab-df-spinner');\n",
              "          try {\n",
              "            const charts = await google.colab.kernel.invokeFunction(\n",
              "                'suggestCharts', [key], {});\n",
              "          } catch (error) {\n",
              "            console.error('Error during call to suggestCharts:', error);\n",
              "          }\n",
              "          quickchartButtonEl.classList.remove('colab-df-spinner');\n",
              "          quickchartButtonEl.classList.add('colab-df-quickchart-complete');\n",
              "        }\n",
              "        (() => {\n",
              "          let quickchartButtonEl =\n",
              "            document.querySelector('#df-c0475337-af1e-4f24-92ac-2ac3358d4d07 button');\n",
              "          quickchartButtonEl.style.display =\n",
              "            google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "        })();\n",
              "      </script>\n",
              "    </div>\n",
              "\n",
              "  <div id=\"id_759a51bc-9c28-4770-8995-e0741e836f30\">\n",
              "    <style>\n",
              "      .colab-df-generate {\n",
              "        background-color: #E8F0FE;\n",
              "        border: none;\n",
              "        border-radius: 50%;\n",
              "        cursor: pointer;\n",
              "        display: none;\n",
              "        fill: #1967D2;\n",
              "        height: 32px;\n",
              "        padding: 0 0 0 0;\n",
              "        width: 32px;\n",
              "      }\n",
              "\n",
              "      .colab-df-generate:hover {\n",
              "        background-color: #E2EBFA;\n",
              "        box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "        fill: #174EA6;\n",
              "      }\n",
              "\n",
              "      [theme=dark] .colab-df-generate {\n",
              "        background-color: #3B4455;\n",
              "        fill: #D2E3FC;\n",
              "      }\n",
              "\n",
              "      [theme=dark] .colab-df-generate:hover {\n",
              "        background-color: #434B5C;\n",
              "        box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "        filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "        fill: #FFFFFF;\n",
              "      }\n",
              "    </style>\n",
              "    <button class=\"colab-df-generate\" onclick=\"generateWithVariable('df')\"\n",
              "            title=\"Generate code using this dataframe.\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "       width=\"24px\">\n",
              "    <path d=\"M7,19H8.4L18.45,9,17,7.55,7,17.6ZM5,21V16.75L18.45,3.32a2,2,0,0,1,2.83,0l1.4,1.43a1.91,1.91,0,0,1,.58,1.4,1.91,1.91,0,0,1-.58,1.4L9.25,21ZM18.45,9,17,7.55Zm-12,3A5.31,5.31,0,0,0,4.9,8.1,5.31,5.31,0,0,0,1,6.5,5.31,5.31,0,0,0,4.9,4.9,5.31,5.31,0,0,0,6.5,1,5.31,5.31,0,0,0,8.1,4.9,5.31,5.31,0,0,0,12,6.5,5.46,5.46,0,0,0,6.5,12Z\"/>\n",
              "  </svg>\n",
              "    </button>\n",
              "    <script>\n",
              "      (() => {\n",
              "      const buttonEl =\n",
              "        document.querySelector('#id_759a51bc-9c28-4770-8995-e0741e836f30 button.colab-df-generate');\n",
              "      buttonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "      buttonEl.onclick = () => {\n",
              "        google.colab.notebook.generateWithVariable('df');\n",
              "      }\n",
              "      })();\n",
              "    </script>\n",
              "  </div>\n",
              "\n",
              "    </div>\n",
              "  </div>\n"
            ],
            "application/vnd.google.colaboratory.intrinsic+json": {
              "type": "dataframe",
              "variable_name": "df",
              "summary": "{\n  \"name\": \"df\",\n  \"rows\": 8,\n  \"fields\": [\n    {\n      \"column\": \"Prompt\",\n      \"properties\": {\n        \"dtype\": \"string\",\n        \"num_unique_values\": 8,\n        \"samples\": [\n          \"What is the largest planet in our solar system?\",\n          \"Summarize the differences between supervised, unsupervised, and reinforcement learning with examples.\",\n          \"Who wrote the novel 'Pride and Prejudice'?\"\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"Model Used\",\n      \"properties\": {\n        \"dtype\": \"category\",\n        \"num_unique_values\": 2,\n        \"samples\": [\n          \"gpt-5-2025-08-07\",\n          \"o4-mini-2025-04-16\"\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"Output\",\n      \"properties\": {\n        \"dtype\": \"string\",\n        \"num_unique_values\": 8,\n        \"samples\": [\n          \"The largest planet in our solar system is Jupiter.  \\n\\nKey facts about Jupiter:  \\n\\u2022 Diameter: about 142,984 km (\\u224811 times that of Earth)  \\n\\u2022 Mass: roughly 1.90 \\u00d7 10^27 kg (over 300 times Earth\\u2019s mass)  \\n\\u2022 Composition: primarily hydrogen and helium (a gas giant)  \\n\\u2022 Notable features: the Great Red Spot (a giant storm), faint ring system, and at least 79 known moons (including Ganymede, the largest moon in the solar system).\",\n          \"Here\\u2019s a concise comparison of the three major paradigms in machine learning:\\n\\n1. Supervised Learning  \\n   \\u2022 Goal: Learn a mapping from inputs X to outputs Y using labeled examples.  \\n   \\u2022 Feedback: Direct, per\\u2010example \\u201ccorrect\\u201d answers (labels).  \\n   \\u2022 Common tasks:  \\n     \\u2013 Classification (e.g. spam vs. non\\u2010spam email)  \\n     \\u2013 Regression (e.g. 
predicting house prices)  \\n   \\u2022 Examples:  \\n     \\u2013 Image-classification networks trained on photos labeled \\u201ccat\\u201d or \\u201cdog.\\u201d  \\n     \\u2013 A model that predicts tomorrow\\u2019s temperature from historical weather data.  \\n\\n2. Unsupervised Learning  \\n   \\u2022 Goal: Discover hidden structure or patterns in unlabeled data.  \\n   \\u2022 Feedback: None (no explicit labels).  \\n   \\u2022 Common tasks:  \\n     \\u2013 Clustering (e.g. segmenting customers into market\\u2010segments via k-means)  \\n     \\u2013 Dimensionality reduction (e.g. PCA for feature compression)  \\n     \\u2013 Anomaly detection (e.g. flagging credit-card fraud)  \\n   \\u2022 Examples:  \\n     \\u2013 Grouping similar news articles by topic when you have no topic labels.  \\n     \\u2013 Reducing the number of features to visualize high-dimensional data.  \\n\\n3. Reinforcement Learning (RL)  \\n   \\u2022 Goal: Learn a policy that maximizes cumulative reward in an environment.  \\n   \\u2022 Feedback: Scalar reward signal (often delayed), no direct \\u201ccorrect\\u201d action.  \\n   \\u2022 Common tasks:  \\n     \\u2013 Control (e.g. robotic arm manipulation)  \\n     \\u2013 Game playing (e.g. AlphaZero in chess and Go)  \\n     \\u2013 Resource management (e.g. dynamic pricing, traffic\\u2010signal control)  \\n   \\u2022 Examples:  \\n     \\u2013 An agent learning to play Pong by trial and error, receiving +1 for winning a point and \\u20131 for losing one.  \\n     \\u2013 A self-driving car learning to navigate safely through rewards for staying on the road and penalties for collisions.  \\n\\nKey distinctions at a glance:  \\n\\u2022 Data: supervised uses labeled data; unsupervised uses unlabeled; RL interacts with an environment.  \\n\\u2022 Feedback: supervised gets exact labels; unsupervised gets none; RL gets only reward signals.  
\\n\\u2022 Objective: supervised predicts outputs; unsupervised finds structure; RL discovers optimal actions over time.\"\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    }\n  ]\n}"
            }
          },
          "metadata": {},
          "execution_count": 41
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "In the results, prompts 2 and 6 had a predicted strong-model win rate above the calibrated threshold, so the router sent them to the strong model (gpt-5). The remaining six prompts were handled by the weaker model (o4-mini)."
      ],
      "metadata": {
        "id": "qIZlRmDeZ_HA"
      }
    }
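    ,
    {
      "cell_type": "markdown",
      "source": [
        "The routing rule described above can be sketched in a few lines of plain Python. This is a simplified illustration, not RouteLLM's own code. It assumes the router has already scored each prompt with a predicted probability that the strong model would give the better answer (its \"win rate\"). The `ModelPair` class, the `route` function, the threshold, and the win-rate values below are all hypothetical.\n",
        "\n",
        "```python\n",
        "from dataclasses import dataclass\n",
        "\n",
        "\n",
        "@dataclass(frozen=True)\n",
        "class ModelPair:\n",
        "    # Hypothetical container for the two models being routed between.\n",
        "    strong: str\n",
        "    weak: str\n",
        "\n",
        "\n",
        "def route(win_rate: float, threshold: float, pair: ModelPair) -> str:\n",
        "    # Use the strong model only when the predicted win rate reaches the threshold.\n",
        "    return pair.strong if win_rate >= threshold else pair.weak\n",
        "\n",
        "\n",
        "pair = ModelPair(strong=\"gpt-5\", weak=\"o4-mini\")\n",
        "threshold = 0.5  # hypothetical; use the value from your own calibration\n",
        "win_rates = [0.12, 0.73, 0.31, 0.58]  # made-up scores for four prompts\n",
        "\n",
        "for i, rate in enumerate(win_rates, start=1):\n",
        "    print(f\"prompt {i}: win rate {rate:.2f} -> {route(rate, threshold, pair)}\")\n",
        "```\n",
        "\n",
        "Raising the threshold sends fewer prompts to the strong model (lower cost, possibly lower quality), and lowering it does the opposite. Whether a prompt scored exactly at the threshold goes to the strong or the weak model is a convention; this sketch sends it to the strong model."
      ],
      "metadata": {}
    }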
  ]
}